Efficient Distribution Mining and Classification
Abstract
We define and solve the problem of “distribution classification” and, more generally, “distribution mining”. Given n distributions (i.e., clouds) of multi-dimensional points, we want to classify them into k classes and to find patterns, rules and outlier clouds. For example, consider the 2-d case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution/cloud of 2-d points (one for each item they bought). We want to group similar customers together, e.g., for market segmentation or anomaly/fraud detection. We propose D-Mine to achieve this goal. Our main contribution is Theorem 3.1, which shows how to use wavelets to speed up the cloud-similarity computations. Extensive experiments on both synthetic and real multidimensional data sets show that our method achieves up to 400 times faster wall-clock time than the naive implementation, with comparable (and occasionally better) classification quality.
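The abstract does not state Theorem 3.1 or the similarity measure that D-Mine uses, so the following is only a rough sketch of the general idea, not the authors' method. It assumes each customer's cloud is bucketized into a small 2-d histogram, that clouds are compared with plain Euclidean distance, and that an orthonormal Haar transform is used. The function names (`histogram`, `haar`, `l2`) and the sample clouds are hypothetical. The sketch relies on one standard fact: an orthonormal transform preserves Euclidean distance, so a distance computed on only the first few Haar coefficients never exceeds the full distance and can serve as a cheap lower bound.

```python
import math

def histogram(points, bins=4, lo=0.0, hi=1.0):
    """Bucketize a cloud of 2-d points into a normalized, flattened histogram."""
    counts = [0.0] * (bins * bins)
    width = (hi - lo) / bins
    for x, y in points:
        # Clamp indices so points on the boundary fall into the edge buckets.
        i = max(0, min(int((x - lo) / width), bins - 1))
        j = max(0, min(int((y - lo) / width), bins - 1))
        counts[i * bins + j] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def haar(vec):
    """Orthonormal 1-d Haar transform; len(vec) must be a power of two.
    Coarse coefficients end up at the front of the result."""
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers: (unit price, quantity) pairs scaled to [0, 1].
cloud_a = [(0.10, 0.10), (0.20, 0.10), (0.15, 0.30)]
cloud_b = [(0.10, 0.20), (0.30, 0.20), (0.10, 0.30)]
cloud_c = [(0.90, 0.90), (0.80, 0.70), (0.95, 0.80)]

ha, hb, hc = histogram(cloud_a), histogram(cloud_b), histogram(cloud_c)
wa, wb, wc = haar(ha), haar(hb), haar(hc)

full_ac = l2(ha, hc)             # distance on the full histograms
approx_ac = l2(wa[:4], wc[:4])   # distance on the 4 coarsest coefficients only
print(full_ac, approx_ac, l2(ha, hb))
```

In this toy data, customers A and B buy cheap items in small quantities and C buys expensive items in larger quantities, so A is closer to B than to C. The truncated-coefficient distance is the kind of shortcut that can prune expensive comparisons; how D-Mine actually does this is given by the paper's Theorem 3.1, which is not reproduced here.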
Similar resources
Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application
With the rapid development of the internet, the amount of information and data being produced is massive. Hence, clients can be overwhelmed by the huge amount of data, and it is difficult to tell which data are useful. Data mining can overcome this problem. When data mining is used on cloud computing, it reduces processing time, energy usage and costs. As the speed of ...
Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
An accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present research work, a novel view is presented on the reservoir characterization using the advantages of thin section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
Resources classification using fractal modelling in Eastern Kahang Cu-Mo porphyry deposit, Central Iran
Resources/reserves classification is crucial for the block model creation used in mine planning and feasibility studies. Selection of estimation methods is an essential part of mineral exploration and mining activities. In other words, resources classification is an issue for mining companies, investors, financial institutions and authorities, but it remains subject to some confusion. The aim of t...
An Efficient Representation Model of Distance Distribution Between Two Uncertain Objects
In this paper, we consider the problem of efficiently computing the distance distribution between two uncertain objects. It is important for many uncertain query evaluation tasks (e.g., range queries, nearest-neighbour queries) and uncertain data mining tasks (e.g., classification, clustering and outlier detection). However, existing approaches involve distance computations between samples of two objects, wh...
Evaluating the effect of using different reference spectra on SAM classification results: an implication for hydrothermal alteration mapping
This research was performed with the objective of evaluating the accuracy of spectral angle mapper (SAM) classification using different reference spectra. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) digital images were applied in the SAM classification in order to map the distribution of hydrothermally altered rocks in the Kerman Cenozoic magmatic arc (KCMA), Iran...
Estimation of reliability-based maintenance time intervals of Load-Haul-Dumper in an underground coal mine
Reliability estimation plays a significant role in the performance assessment of mining equipment, and aids in designing efficient and effective preventive maintenance strategies. Continuous and random/irregular occurrence of failures in a system could be the main cause for performance drop of machinery. The accomplishment of a projected level of production is possible only by an efficient oper...
Publication date: 2008